Predictive variance


Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification

Franzen, Courtney, Pourkamali-Anaraki, Farhad

arXiv.org Machine Learning

Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
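The method-of-moments construction described above can be sketched as follows. Assuming the ensemble's softmax outputs for one input are stacked into an (M, K) array, the Dirichlet mean is matched to the ensemble mean and the precision is recovered from the per-class variances; the geometric-mean pooling of the per-class precision estimates is one common convention and not necessarily the authors' exact estimator.

```python
import numpy as np

def dirichlet_mom(probs, eps=1e-12):
    """Method-of-moments Dirichlet fit to an ensemble's softmax outputs.

    probs: array of shape (M, K) -- softmax vectors from M ensemble members
    for a single input. Returns estimated Dirichlet parameters alpha (K,).
    """
    m = probs.mean(axis=0)                # E[p_k] = alpha_k / alpha_0
    v = probs.var(axis=0)                 # Var[p_k] = m_k (1 - m_k) / (alpha_0 + 1)
    # per-class precision estimates alpha_0, guarded against zero variance
    s_k = m * (1.0 - m) / np.maximum(v, eps) - 1.0
    # pool the K estimates via their geometric mean (one standard choice)
    alpha0 = np.exp(np.mean(np.log(np.maximum(s_k, eps))))
    return m * alpha0
```

The resulting alpha could then serve as the starting point for the optional maximum-likelihood refinement mentioned in the abstract.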


Supplementary Material

Neural Information Processing Systems

Let π0(·|s) be a Gaussian behavioral reference policy with mean µ0(s) and variance σ0²(s), and let π(·|s) be an online policy with reparameterization a_t = f_φ(ε_t; s_t) and random vector ε_t. Whilst entropy regularization partially mitigates the collapse of predictive variance away from the expert demonstrations, we still observe the wrong trend similar to Figure 1, with predictive variances high near the expert demonstrations and low on unseen data. AWAC performs online fine-tuning of a policy pre-trained on offline data. The method requires additional off-policy data to be generated to saturate the replay buffer, thereby requiring a hidden number of environment interactions that do not involve learning. To mitigate this, in practice, BRAC adds an entropy bonus to the supervised learning objective, which stabilizes the variance around the training set but has no guarantees away from the data.
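The reparameterized sampling path a_t = f_φ(ε_t; s_t) for a Gaussian policy can be illustrated with a minimal sketch; here mu and log_std stand in for the policy network's outputs at a state, and plain NumPy is used, so only the sampling path (not gradient flow) is shown.

```python
import numpy as np

def reparam_action(mu, log_std, rng):
    """Reparameterized Gaussian policy sample: a = mu(s) + sigma(s) * eps,
    with eps ~ N(0, I) drawn independently of the policy parameters."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_std) * eps, eps
```

In an autodiff framework, writing the sample this way lets gradients pass through mu and log_std while the randomness stays in eps, which is what makes entropy-regularized objectives over π(·|s) trainable.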






2 Neural network ensembles and their relations to kernels

Neural Information Processing Systems

Although the ongoing success of deep learning is remarkable, the increasing data, model, and training-algorithm complexity makes a thorough understanding of their inner workings increasingly difficult.


sup

Neural Information Processing Systems

C.1 2D Synthetic Benchmark. For both benchmarks, we sample 500 observations x_i = (x_{1i}, x_{2i}) from each of the two in-domain classes (orange and blue), and consider a deep architecture, ResFFN-12-128, which contains 12 residual feedforward layers with 128 hidden units and a dropout rate of 0.01.
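A data-generation sketch for such a benchmark is shown below; the class centers and spread are hypothetical stand-ins, since the fragment does not specify the two class distributions.

```python
import numpy as np

def make_two_class_2d(n_per_class=500, seed=0):
    """Sample n_per_class 2D observations from each of two in-domain
    classes (hypothetical Gaussian blobs; the paper's actual class
    distributions are not given in this excerpt)."""
    rng = np.random.default_rng(seed)
    x_orange = rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(n_per_class, 2))
    x_blue = rng.normal(loc=(2.0, 0.0), scale=0.5, size=(n_per_class, 2))
    X = np.vstack([x_orange, x_blue])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y
```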


Multi-level Monte Carlo Dropout for Efficient Uncertainty Quantification

Pim, Aaron, Pryer, Tristan

arXiv.org Machine Learning

We develop a multilevel Monte Carlo (MLMC) framework for uncertainty quantification with Monte Carlo dropout. Treating dropout masks as a source of epistemic randomness, we define a fidelity hierarchy by the number of stochastic forward passes used to estimate predictive moments. We construct coupled coarse-fine estimators by reusing dropout masks across fidelities, yielding telescoping MLMC estimators for both predictive means and predictive variances that remain unbiased for the corresponding dropout-induced quantities while reducing sampling variance at fixed evaluation budget. We derive explicit bias, variance, and effective-cost expressions, together with sample-allocation rules across levels. Numerical experiments on forward and inverse PINN-Uzawa benchmarks confirm the predicted variance rates and demonstrate efficiency gains over single-level MC-dropout at matched cost.
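A minimal sketch of the coupled telescoping estimator for the predictive mean follows, using a toy linear "network" as a stand-in for a dropout model; the level sizes, the sample-allocation list N, and the coupling-by-shared-masks are illustrative of the construction, not the paper's exact estimators (which also cover predictive variances).

```python
import numpy as np

rng = np.random.default_rng(0)
dim, keep = 32, 0.9
w = rng.normal(size=dim)  # toy "network" weights (hypothetical stand-in)

def forward(x, mask):
    # toy dropout forward pass with inverted-dropout scaling
    return float((w * mask / keep) @ x)

def mlmc_mean(x, L, N, m0, rng):
    """Telescoping MLMC estimator of the dropout-induced predictive mean.

    Level l uses M_l = m0 * 2**l forward passes; the coarse estimator at
    each level reuses the first half of the fine level's dropout masks
    (the coupling), so each correction term has small variance."""
    total = 0.0
    for l in range(L + 1):
        Ml, Nl = m0 * 2 ** l, N[l]
        acc = 0.0
        for _ in range(Nl):
            masks = rng.random((Ml, dim)) < keep
            preds = np.array([forward(x, m) for m in masks])
            fine = preds.mean()
            # level 0 estimates the mean directly; higher levels estimate
            # the coarse-fine correction E[P_l - P_{l-1}]
            acc += fine if l == 0 else fine - preds[: Ml // 2].mean()
        total += acc / Nl
    return total
```

Because every level's fine estimator is unbiased for the same dropout-induced mean, the correction terms have zero expectation and the telescoping sum stays unbiased while the shared masks keep each correction's variance small.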


Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Neural Information Processing Systems

Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision-making process. A simple and empirically validated technique is based on deep ensembles, where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases leading to the performance of deep ensembles' uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show theoretically and empirically that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources, leading to significant changes and improved OOD detection in trained deep ensembles.
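The variance-as-uncertainty technique the abstract builds on can be sketched in a few lines: stack the members' softmax outputs for an input and use their member-to-member variance as the OOD score. This is a generic sketch of the baseline, not the paper's noise-source decomposition.

```python
import numpy as np

def ensemble_variance_score(prob_stack):
    """OOD score from a deep ensemble's disagreement.

    prob_stack: (M, K) softmax outputs of M networks for one input.
    Returns the member-to-member variance summed over classes; higher
    values indicate more disagreement, i.e. more input uncertainty."""
    return float(prob_stack.var(axis=0).sum())
```

Members that agree on a confident prediction give a score near zero, while members that scatter their mass over different classes give a large score, which is exactly the behavior the paper's NTK analysis seeks to explain.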